Managing Uncertain Data: A Dissertation Submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Author

  • Anish Das Sarma
Abstract

The ubiquity of uncertain data in modern-day applications (such as information extraction, data integration, sensor and RFID networks, and scientific experiments) has resulted in a growing need for techniques to deal with such data. This thesis addresses challenges in managing uncertain data in a principled, usable, and scalable fashion. We identify and explore a fundamental tension between usability and expressiveness in models for representing uncertain data. We propose a space of models for representing uncertain data, place the models in an expressiveness hierarchy, and study how the models relate to each other in terms of closure properties. We also address important problems of uniqueness testing, equivalence checking, minimization, and approximation in our space of models. For a representative model in our space (called URM), we study database design theory: We provide a sound and complete axiomatization of functional dependencies (FDs) for URM data, describe lossless decompositions, and give algorithms and complexity results for testing, finding, and inferring FDs. To address the usability-expressiveness tradeoff, we show that by adding lineage (provenance) to the URM model, we obtain a complete (intuitively, a fully expressive) data model, which we call the Uncertainty-Lineage Database (ULDB) model. We study properties of ULDBs including membership, extraction, and minimization. We develop techniques for query processing over ULDBs and show that lineage can be exploited for efficient confidence computation in ULDBs. Then, we present an extension to ULDBs that allows a seamless incorporation of data modifications and a lightweight versioning capability. Finally, we look at uncertain data management in the context of data integration. Data integration systems offer a uniform interface to a set of data sources. Despite recent progress, setting up and maintaining a data integration application still requires significant up-front


Similar resources

Gaze-enhanced User Interface Design: A Dissertation Submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy


Structuring Peer Interactions for Massive Scale Learning: A Dissertation Submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy


Haptics and Physical Simulation for Virtual Bone Surgery: A Dissertation Submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy



Publication date: 2009